{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyOo/yJhRQYcANIK1Z8U9k10",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/sugarforever/LangChain-Advanced/blob/main/06_Web_Research_Retriever.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 06 Web Research Retriever\n"
      ],
      "metadata": {
        "id": "cUbNOf6aAD2w"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Example"
      ],
      "metadata": {
        "id": "ZCMb8QtOBawT"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xPWZLTgY5E_4"
      },
      "outputs": [],
      "source": [
        "!pip install -q -U langchain openai chromadb tiktoken"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "os.environ['OPENAI_API_KEY'] = \"your valid openai api key\""
      ],
      "metadata": {
        "id": "eVQjKJn-5ZZZ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.retrievers.web_research import WebResearchRetriever"
      ],
      "metadata": {
        "id": "Wui4jqtAOFol"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from langchain.vectorstores import Chroma\n",
        "from langchain.embeddings import OpenAIEmbeddings\n",
        "from langchain.chat_models.openai import ChatOpenAI\n",
        "from langchain.utilities import GoogleSearchAPIWrapper\n",
        "\n",
        "# Vectorstore\n",
        "vectorstore = Chroma(embedding_function=OpenAIEmbeddings(),persist_directory=\"./chroma_db_oai\")\n",
        "\n",
        "# LLM\n",
        "llm = ChatOpenAI(temperature=0)\n",
        "\n",
        "# Request from https://programmablesearchengine.google.com/controlpanel/all\n",
        "os.environ[\"GOOGLE_CSE_ID\"] = \"yor cse id\"\n",
        "# Request from https://developers.google.com/custom-search/v1/introduction\n",
        "os.environ[\"GOOGLE_API_KEY\"] = \"your google api key\"\n",
        "search = GoogleSearchAPIWrapper()"
      ],
      "metadata": {
        "id": "UNMFu8SvOOW-"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "search.run(\"What is vitamin?\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 128
        },
        "id": "6_eSDCxoOgpu",
        "outputId": "f499890c-9138-4da0-c1f9-e81b00e4336b"
      },
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "\"Dec 16, 2020 ... A vitamin is an organic compound, which means that it contains carbon. It is also an essential nutrient that the body may need to get from food. Sep 18, 2023 ... Total vitamin D intakes were three times higher with supplement use than with diet alone; the mean intake from foods and beverages alone for\\xa0... The two main forms of vitamin A in the human diet are preformed vitamin A (retinol, retinyl esters), and provitamin A carotenoids such as alpha-carotene and\\xa0... Nov 8, 2022 ... Vitamin D helps maintain strong bones. Learn how much you need, good sources, deficiency symptoms, and health effects here. Jan 19, 2023 ... Vitamins are a group of substances that are needed for normal cell function, growth, and development. There are 13 essential vitamins. This\\xa0... Aug 12, 2022 ... Vitamin A also helps your heart, lungs, and other organs work properly. Carotenoids are pigments that give yellow, orange, and red fruits and\\xa0... Aug 2, 2022 ... Vitamin D deficiency means that you don't have enough vitamin D in your body. It's common and primarily causes issues with your bones and\\xa0... Vitamin A, along with other vitamins, minerals and other compounds, is an essential micronutrient. This means that our bodies cannot manufacture it and\\xa0... Vitamin D is both a nutrient we eat and a hormone our bodies make. It is a fat-soluble vitamin that has long been known to help the body absorb and retain\\xa0... If you take vitamin A for its antioxidant properties, keep in mind that the supplement might not offer the same benefits as naturally occurring antioxidants in\\xa0...\""
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 18
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "web_research_retriever = WebResearchRetriever.from_llm(\n",
        "    vectorstore=vectorstore,\n",
        "    llm=llm,\n",
        "    search=search,\n",
        ")"
      ],
      "metadata": {
        "id": "jeT8hws8Op6V"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install -q -U html2text"
      ],
      "metadata": {
        "id": "EAUtzZ8bOxyr"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.chains import RetrievalQAWithSourcesChain\n",
        "user_input = \"Who is the winner of FIFA world cup 2002?\"\n",
        "qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=web_research_retriever)\n",
        "result = qa_chain({\"question\": user_input})\n",
        "result"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "BbuktT3-OsRu",
        "outputId": "8242c64e-6ca2-45f3-a3a0-aa58bbcd249c"
      },
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "INFO:langchain.retrievers.web_research:Generating questions for Google Search ...\n",
            "INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'Who is the winner of FIFA world cup 2002?', 'text': LineList(lines=['1. \"Which country won the FIFA World Cup in 2002?\"\\n', '2. \"Who was the champion of the FIFA World Cup held in 2002?\"\\n', '3. \"Who emerged as the victor in the FIFA World Cup 2002?\"'])}\n",
            "INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. \"Which country won the FIFA World Cup in 2002?\"\\n', '2. \"Who was the champion of the FIFA World Cup held in 2002?\"\\n', '3. \"Who emerged as the victor in the FIFA World Cup 2002?\"']\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': '2002 FIFA World Cup - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/2002_FIFA_World_Cup', 'snippet': 'However, the most potent team at the tournament, Brazil, prevailed, winning the final against Germany 2–0, making them the first and only country to have won\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': '2002 FIFA World Cup - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/2002_FIFA_World_Cup', 'snippet': \"... FIFA World Cup, the quadrennial football world championship for men's national teams organized by FIFA. It was held from 31 May to 30 June 2002 at sites in\\xa0...\"}]\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': 'FIFA World Cup Host Countries 2023', 'link': 'https://worldpopulationreview.com/country-rankings/fifa-world-cup-host-countries', 'snippet': 'France emerged as the final victor of the 2018 FIFA World Cup, which took place in Russia. ... 2002: Japan/South Korea; 1998: France (2nd time); 1994: United\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:New URLs to load: []\n",
            "INFO:langchain.retrievers.web_research:Grabbing most relevant splits from urls...\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'question': 'Who is the winner of FIFA world cup 2002?',\n",
              " 'answer': 'Brazil is the winner of the FIFA World Cup 2002.\\n',\n",
              " 'sources': 'https://en.wikipedia.org/wiki/2002_FIFA_World_Cup'}"
            ]
          },
          "metadata": {},
          "execution_count": 19
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import logging\n",
        "logging.basicConfig()\n",
        "logging.getLogger(\"langchain.retrievers.web_research\").setLevel(logging.INFO)\n",
        "user_input = \"What is Task Decomposition in LLM Powered Autonomous Agents?\"\n",
        "docs = web_research_retriever.get_relevant_documents(user_input)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "k8DHpRiMO--N",
        "outputId": "2fcb6704-61fd-4c76-8efa-76891293039c"
      },
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "INFO:langchain.retrievers.web_research:Generating questions for Google Search ...\n",
            "INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is Task Decomposition in LLM Powered Autonomous Agents?', 'text': LineList(lines=['1. How does task decomposition work in LLM powered autonomous agents?\\n', '2. What is the role of task decomposition in LLM powered autonomous agents?\\n', '3. Can you explain the concept of task decomposition in LLM powered autonomous agents?'])}\n",
            "INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. How does task decomposition work in LLM powered autonomous agents?\\n', '2. What is the role of task decomposition in LLM powered autonomous agents?\\n', '3. Can you explain the concept of task decomposition in LLM powered autonomous agents?']\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'link': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'snippet': 'Jun 23, 2023 ... Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\" , \"What are the subgoals for achieving XYZ?\" , (2)\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'link': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'snippet': 'Jun 23, 2023 ... Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\" , \"What are the subgoals for achieving XYZ?\" , (2)\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'link': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'snippet': 'Jun 23, 2023 ... Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\" , \"What are the subgoals for achieving XYZ?\" , (2)\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:New URLs to load: []\n",
            "INFO:langchain.retrievers.web_research:Grabbing most relevant splits from urls...\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "import re\n",
        "from typing import List\n",
        "from langchain.chains import LLMChain\n",
        "from pydantic import BaseModel, Field\n",
        "from langchain.prompts import PromptTemplate\n",
        "from langchain.output_parsers.pydantic import PydanticOutputParser\n",
        "\n",
        "# LLMChain\n",
        "search_prompt = PromptTemplate(\n",
        "    input_variables=[\"question\"],\n",
        "    template=\"\"\"You are an assistant tasked with improving Google search\n",
        "    results. Generate 5 Google search queries that are similar to\n",
        "    this question. The output should be a numbered list of questions and each\n",
        "    should have a question mark at the end: {question}\"\"\",\n",
        ")\n",
        "\n",
        "class LineList(BaseModel):\n",
        "    \"\"\"List of questions.\"\"\"\n",
        "\n",
        "    lines: List[str] = Field(description=\"Questions\")\n",
        "\n",
        "class QuestionListOutputParser(PydanticOutputParser):\n",
        "    \"\"\"Output parser for a list of numbered questions.\"\"\"\n",
        "\n",
        "    def __init__(self) -> None:\n",
        "        super().__init__(pydantic_object=LineList)\n",
        "\n",
        "    def parse(self, text: str) -> LineList:\n",
        "        lines = re.findall(r\"\\d+\\..*?\\n\", text)\n",
        "        return LineList(lines=lines)\n",
        "\n",
        "llm_chain = LLMChain(llm=llm, prompt=search_prompt, output_parser=QuestionListOutputParser())"
      ],
      "metadata": {
        "id": "jNfh403xQZ2X"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Initialize\n",
        "web_research_retriever_llm_chain = WebResearchRetriever(vectorstore=vectorstore, llm_chain=llm_chain, search=search)\n",
        "\n",
        "# Run\n",
        "docs = web_research_retriever_llm_chain.get_relevant_documents(\"What is the recommended way to recycle plastics?\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "MXpFnjFDQjAe",
        "outputId": "7e8e2e86-41c0-432b-ccce-d1f4dc1248c3"
      },
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "INFO:langchain.retrievers.web_research:Generating questions for Google Search ...\n",
            "INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'What is the recommended way to recycle plastics?', 'text': LineList(lines=['1. How can I recycle plastics effectively?\\n', '2. What are the best practices for recycling plastics?\\n', '3. Which methods are recommended for recycling plastics?\\n', '4. What is the most efficient way to recycle plastics?\\n'])}\n",
            "INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. How can I recycle plastics effectively?\\n', '2. What are the best practices for recycling plastics?\\n', '3. Which methods are recommended for recycling plastics?\\n', '4. What is the most efficient way to recycle plastics?\\n']\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': '7 Tips to Recycle Better - Earth Day', 'link': 'https://www.earthday.org/7-tips-to-recycle-better/', 'snippet': \"Feb 25, 2022 ... Solution: Just as the rule states, make sure your recyclables are clean, empty and dry. It'll take seconds and if everyone did it, it would save\\xa0...\"}]\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': 'What Is Recycling & What to Recycle | WM', 'link': 'https://www.wm.com/us/en/recycle-right/recycling-101', 'snippet': 'What Can Be Recycled: Recycling Guide. Learn guidelines and common myths related to recycling. You can also download and print our Curbside Recycling Myths\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': 'Plastics recycling: challenges and opportunities - PMC', 'link': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873020/', 'snippet': 'Table 3 provides data on some environmental impacts from production of virgin commodity plastics ... ACRR 2004Good practices guide on waste plastics recycling\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:Searching for relevant urls...\n",
            "INFO:langchain.retrievers.web_research:Search results: [{'title': 'Frequent Questions on Recycling | US EPA', 'link': 'https://www.epa.gov/recycle/frequent-questions-recycling', 'snippet': 'May 29, 2023 ... Recycling just 10 plastic bottles saves enough energy to power a laptop for more than 25 hours. How does recycling save energy? When we make new\\xa0...'}]\n",
            "INFO:langchain.retrievers.web_research:New URLs to load: ['https://www.earthday.org/7-tips-to-recycle-better/', 'https://www.wm.com/us/en/recycle-right/recycling-101', 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873020/', 'https://www.epa.gov/recycle/frequent-questions-recycling']\n",
            "INFO:langchain.retrievers.web_research:Indexing new urls...\n",
            "Fetching pages: 100%|##########| 4/4 [00:01<00:00,  3.14it/s]\n",
            "INFO:langchain.retrievers.web_research:Grabbing most relevant splits from urls...\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "docs"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cpCR8J5gQs7J",
        "outputId": "bb7a014a-35ac-4085-cb20-1f51c5e131b0"
      },
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(page_content='_Solution: Just as the rule states, make sure your recyclables are clean,\\nempty and dry. It’ll take seconds and if everyone did it, it would save tons\\nof recyclables going to the landfill._\\n\\n## **4\\\\. Combined materials are trash**\\n\\nRecycling only works when like materials are together. Unfortunately, items\\nlike plastic-coated coffee cups, laminated paper and paper-bubble wrap\\nenvelopes from the mail can’t ever be separated, which means they’re trash.\\n\\n_Solution: Try to avoid buying nonrecyclable materials that can’t be\\nseparated. And when you can, shop local to cut down on the carbon footprint of\\nyour products._\\n\\n## **5\\\\. Know your plastics**\\n\\nNot all plastics are treated equally. Rigid plastics are recyclable, labeled\\nby resin codes 1 through 7. Generally, the higher the number, the less\\nrecyclable it is. Most recycling centers will recycle plastics 1 and 2 without\\na problem. Past that, it gets tricky.\\n\\nFurthermore, a lot of plastic just isn’t recyclable curbside. As noted\\nearlier, you can’t recycle plastic bags or films. Additionally, you can’t\\nrecycle anything that can tear like paper. That means no cracker bags, chip\\nbags or cereal bags.\\n\\n“With plastics, it does get so confusing,” says Erin Hafner of Baltimore’s\\nrecycling program. “Clamshell containers, cutlery, plastic straws — all that\\nstuff that ends up in the [recycling] bin.” And it shouldn’t.\\n\\n_Solution: Check your city’s recycling website for the number the city takes._\\n\\n## **6\\\\. Stop wishcycling**', metadata={'source': 'https://www.earthday.org/7-tips-to-recycle-better/'}),\n",
              " Document(page_content='Effective recycling of mixed plastics waste is the next major challenge for\\nthe plastics recycling sector. The advantage is the ability to recycle a\\nlarger proportion of the plastic waste stream by expanding post-consumer\\ncollection of plastic packaging to cover a wider variety of materials and pack\\ntypes. Product design for recycling has strong potential to assist in such\\nrecycling efforts. A study carried out in the UK found that the amount of\\npackaging in a regular shopping basket that, even if collected, cannot be\\neffectively recycled, ranged from 21 to 40% (Local Government Association (UK)\\n2007). Hence, wider implementation of policies to promote the use of\\nenvironmental design principles by industry could have a large impact on\\nrecycling performance, increasing the proportion of packaging that can\\neconomically be collected and diverted from landfill (see Shaxson _et al._\\n2009). The same logic applies to durable consumer goods designing for\\ndisassembly, recycling and specifications for use of recycled resins are key\\nactions to increase recycling.', metadata={'source': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873020/'}),\n",
              " Document(page_content='Recycling is one of the most important actions currently available to reduce\\nthese impacts and represents one of the most dynamic areas in the plastics\\nindustry today. Recycling provides opportunities to reduce oil usage, carbon\\ndioxide emissions and the quantities of waste requiring disposal. Here, we\\nbriefly set recycling into context against other waste-reduction strategies,\\nnamely reduction in material use through downgauging or product reuse, the use\\nof alternative biodegradable materials and energy recovery as fuel.\\n\\nWhile plastics have been recycled since the 1970s, the quantities that are\\nrecycled vary geographically, according to plastic type and application.\\nRecycling of packaging materials has seen rapid expansion over the last\\ndecades in a number of countries. Advances in technologies and systems for the\\ncollection, sorting and reprocessing of recyclable plastics are creating new\\nopportunities for recycling, and with the combined actions of the public,\\nindustry and governments it may be possible to divert the majority of plastic\\nwaste from landfills to recycling over the next decades.\\n\\n **Keywords:** plastics recycling, plastic packaging, environmental impacts,\\nwaste management, chemical recycling, energy recovery\\n\\n## 1\\\\. Introduction', metadata={'source': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873020/'}),\n",
              " Document(page_content='The effectiveness of post-consumer packaging recycling could be dramatically\\nincreased if the diversity of materials were to be rationalized to a subset of\\ncurrent usage. For example, if rigid plastic containers ranging from bottles,\\njars to trays were all PET, HDPE and PP, without clear PVC or PS, which are\\nproblematic to sort from co-mingled recyclables, then all rigid plastic\\npackaging could be collected and sorted to make recycled resins with minimal\\ncross-contamination. The losses of rejected material and the value of the\\nrecycled resins would be enhanced. In addition, labels and adhesive materials\\nshould be selected to maximize recycling performance. Improvements in\\nsorting/separation within recycling plants give further potential for both\\nhigher recycling volumes, and better eco-efficiency by decreasing waste\\nfractions, energy and water use (see §3). The goals should be to maximize both\\nthe volume and quality of recycled resins.\\n\\n## 9\\\\. Conclusions\\n\\nIn summary, recycling is one strategy for end-of-life waste management of\\nplastic products. It makes increasing sense economically as well as\\nenvironmentally and recent trends demonstrate a substantial increase in the\\nrate of recovery and recycling of plastic wastes. These trends are likely to\\ncontinue, but some significant challenges still exist from both technological\\nfactors and from economic or social behaviour issues relating to the\\ncollection of recyclable wastes, and substitution for virgin material.', metadata={'source': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873020/'})]"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    }
  ]
}